Three-dimensional conditional generative adversarial network-based virtual thin-slice technique for the morphological evaluation of the spine

Virtual thin-slice (VTS) technique is a generative adversarial network-based algorithm that can generate virtual 1-mm-thick CT images from images of 3–10-mm thickness. We evaluated the performance of VTS technique for assessment of the spine. VTS was applied to 4-mm-thick CT images of 73 patients, and the visibility of intervertebral spaces was evaluated on the 4-mm-thick and VTS images. The heights of vertebrae measured on sagittal images reconstructed from the 4-mm-thick images and VTS images were compared with those measured on images reconstructed from 1-mm-thick images. Diagnostic performance for the detection of compression fractures was also compared. The intervertebral spaces were significantly more visible on the VTS images than on the 4-mm-thick images (P < 0.001). The absolute value of the measured difference in mean vertebral height between the VTS and 1-mm-thick images was smaller than that between the 4-mm-thick and 1-mm-thick images (P < 0.01–0.54). The diagnostic performance of the VTS images for detecting compression fracture was significantly lower than that of the 4-mm-thick images for one reader (P = 0.02). VTS technique enabled the identification of each vertebral body, and enabled accurate measurement of vertebral height. However, this technique is not suitable for diagnosing compression fractures.


Materials and methods
This retrospective study was approved by the Osaka University Clinical Research Review Committee, which also waived the requirement for informed consent. All methods were carried out in accordance with relevant guidelines and regulations. Patients who underwent CT for evaluation of aortic or cardiac disease were eligible for inclusion because a single scan from the supraclavicular area to the symphysis pubis was obtained in one breath-hold in these patients, whereas separate scans were obtained for the chest and abdominopelvic regions in other patients. We enrolled 73 consecutive patients who underwent CT between January and February 2019 or between December 2020 and January 2021 (50 men and 23 women; age range, 25-91 years; mean age, 72.9 years). The clinical indications for CT in these patients are listed in Table 1.
CT examination. CT was performed using a 160- or 320-slice CT scanner (Aquilion Precision, Canon Medical Systems, Otawara, Japan, n = 34, or Aquilion ONE GENESIS Edition, Canon Medical Systems, n = 39). A pre-contrast scan was performed in all patients from the supraclavicular area to the symphysis pubis during a single breath-hold. Tube current was adjusted individually using an auto-exposure control technique with a standard deviation setting of 15. The remaining scan parameters were as follows: tube voltage, 120 kVp; rotation time, 0.5 s; helical pitch, 0.83. Although post-contrast scans were also acquired in 31 patients, only the pre-contrast images were used in this study.
From the raw data of each patient, two sets of axial images were reconstructed, with a slice thickness/interval of 4/4 and 1/1 mm. A hybrid iterative reconstruction algorithm (AIDR 3D, Canon Medical Systems) with a weak strength setting was applied. The remaining reconstruction parameters were as follows: kernel, FC03; reconstruction field of view, 350 mm (pixel size, 0.68 × 0.68 mm).
Virtual thin-slice technique. VTS is a conditional GAN-based algorithm. For training, thick-slice images with slice thickness/intervals of 3-10 mm were randomly simulated from real thin-slice images by down-sampling with Gaussian smoothing. Pairs of original thin-slice images and simulated thick-slice images were used to train the VTS generator in the GAN framework (Fig. 1). The generator has an encoder-decoder architecture with skip connections, inspired by U-Net, to reconstruct high-resolution images. The role of the discriminator is to push the generator to output virtual thin-slice images that are hard to distinguish from real ones. Both the generator and the discriminator are composed of 3D convolutional neural networks. The conditioning labels (e.g. slice interval) associated with the input thick-slice images are fed into the discriminator to improve the accuracy of super-resolution. During generator training, L1 loss was calculated in addition to adversarial loss to minimize the pixel-wise intensity difference between the original (ground truth) and the generated thin-slice images, as these should be as close as possible. VTS software is a function of the PACS viewer (SYNAPSE SAI Viewer Version
Qualitative analysis. Two radiologists familiar with abdominal radiology (9 and 6 years' experience) independently reviewed the sagittal images reformatted from the 4-mm-thick images and the VTS images and evaluated the visibility of the intervertebral spaces in each of four regions: cervical, upper thoracic, lower thoracic, and lumbar spine. They reviewed these images on a commercially available workstation (SYNAPSE VINCENT version 5.3.001, FUJIFILM) and assigned a score using the following 4-point scale: 4, all intervertebral spaces are visible; 3, most intervertebral spaces are visible but some are unclear; 2, most intervertebral spaces are unclear; 1, no intervertebral spaces are visible. The radiologists were informed that the images for evaluation were either 4-mm-thick or VTS images, but were blinded to the patients' identity, medical background, and the reconstruction protocol used.
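The thick-slice simulation used to build VTS training pairs (down-sampling of thin-slice images with Gaussian smoothing) can be illustrated with a minimal one-dimensional sketch along the slice axis. This is not the vendor's implementation: the function names, the Gaussian width (assumed here to be half the thickness ratio), and the edge handling are all hypothetical choices made only for illustration.

```python
import math

def gaussian_kernel(sigma, radius):
    """Return a normalized 1D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def simulate_thick_profile(thin_profile, factor, sigma=None):
    """Simulate thick-slice values from a thin-slice intensity profile along z.

    thin_profile: intensities sampled at the thin-slice interval (e.g. 1 mm).
    factor: thick/thin slice-thickness ratio (e.g. 4 for 4 mm from 1 mm).
    sigma: Gaussian width in thin-slice units; factor / 2 is an ASSUMED default,
           not a value taken from the paper.
    """
    if sigma is None:
        sigma = factor / 2.0
    radius = max(1, int(math.ceil(3 * sigma)))
    kernel = gaussian_kernel(sigma, radius)
    n = len(thin_profile)
    smoothed = []
    for z in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            # Replicate the edge values outside the scanned range (assumption).
            idx = min(max(z + k - radius, 0), n - 1)
            acc += w * thin_profile[idx]
        smoothed.append(acc)
    # Keep one sample per thick slice, centred in each group of `factor` thin slices.
    return smoothed[factor // 2::factor]

# A 2-mm low-intensity gap (a stand-in for a disc space) between two bright blocks.
thin = [100.0] * 8 + [0.0] * 2 + [100.0] * 8
thick = simulate_thick_profile(thin, factor=4)
print(len(thin), "thin samples ->", len(thick), "thick samples")
print([round(v, 1) for v in thick])
```

In this toy profile the gap never reaches zero in the simulated thick-slice output, which mirrors why intervertebral spaces are hard to see on reformatted thick-slice images.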
Quantitative analysis. Two radiologists familiar with abdominal radiology (16 and 9 years' experience), different from the radiologists who performed the qualitative assessment, independently measured the height of the first thoracic (Th1) and first lumbar (L1) vertebrae on sagittal reformatted images made from each of the 4-mm-thick, true 1-mm-thick, and VTS data sets. Height was measured at the anterior border of each of these vertebrae. The absolute value of the difference between the heights measured on the 4-mm-thick and true 1-mm-thick images (D1) was calculated, as was the absolute value of the difference between the heights measured on the VTS and true 1-mm-thick images (D2). The absolute percentage error between the heights measured on the 4-mm-thick and true 1-mm-thick images (%Error1) was calculated by dividing D1 by the height measured on the true 1-mm-thick images; the absolute percentage error between the heights measured on the VTS and true 1-mm-thick images (%Error2) was calculated in the same way from D2. Measurements were performed using a workstation (SYNAPSE VINCENT version 5.3.001).
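The two error measures defined above reduce to simple arithmetic, sketched below. The function names and the example heights are hypothetical and are not taken from the study data.

```python
def abs_difference(height_test_mm, height_ref_mm):
    """Absolute difference (mm) between a test measurement and the reference
    measurement on the true 1-mm-thick images (D1 or D2 in the text)."""
    return abs(height_test_mm - height_ref_mm)

def abs_percentage_error(height_test_mm, height_ref_mm):
    """Absolute percentage error: the absolute difference divided by the
    reference height, expressed in percent (%Error1 or %Error2 in the text)."""
    return 100.0 * abs_difference(height_test_mm, height_ref_mm) / height_ref_mm

# Hypothetical anterior heights (mm) of one vertebra on the three data sets.
h_thick, h_vts, h_thin = 17.0, 18.5, 19.0

d1 = abs_difference(h_thick, h_thin)          # 4-mm-thick vs true 1-mm-thick
d2 = abs_difference(h_vts, h_thin)            # VTS vs true 1-mm-thick
err1 = abs_percentage_error(h_thick, h_thin)
err2 = abs_percentage_error(h_vts, h_thin)
print(f"D1 = {d1:.1f} mm, D2 = {d2:.1f} mm")
print(f"%Error1 = {err1:.1f}%, %Error2 = {err2:.1f}%")
```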
Diagnostic performance in detecting compression fracture. The same two radiologists who performed the qualitative assessment also independently evaluated the possible presence of compression fracture using the sagittal reformatted images constructed from each of the 4-mm-thick images and the VTS images. They classified the likelihood of compression fracture in all vertebrae using the following 4-point confidence scale: 1, probably no fracture present; 2, indefinite presence of fracture; 3, fracture probably present; and 4, fracture definitely present. Before the assessment, they were informed that a confidence level of 3 or 4 would be considered a positive finding for the calculation of sensitivity and positive predictive value (PPV). The criteria for compression fracture used in this study were: 1, ratio of the anterior height of the vertebra (AH) to the posterior height (PH) < 0.75; 2, ratio of the central height of the vertebra (CH) to AH or PH < 0.8; 3, height of a vertebra reduced by > 20% compared with those above and below 15. The reference standard was determined in consensus by two other radiologists (16 and 9 years' experience), who evaluated the presence or absence of compression fracture on sagittal images reformatted from the true 1-mm-thick images.
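The three height-based criteria listed above can be written as a short check. This is a sketch under stated assumptions: the function name is hypothetical, any single criterion is assumed to be sufficient, and criterion 3 is assumed to require a reduction relative to both the vertebra above and the vertebra below, using whichever height the reader compares. The text does not specify these details.

```python
def meets_fracture_criteria(ah, ch, ph, height=None,
                            height_above=None, height_below=None):
    """Evaluate the three compression-fracture criteria from vertebral heights (mm).

    ah, ch, ph: anterior, central, and posterior heights of the vertebra.
    height, height_above, height_below: comparable heights of the vertebra and
    its two neighbours for criterion 3; the criterion is skipped if any is missing.
    Returns a dict with one flag per criterion and an overall 'any' flag.
    """
    c1 = ah / ph < 0.75                          # criterion 1: AH/PH < 0.75
    c2 = (ch / ah < 0.8) or (ch / ph < 0.8)      # criterion 2: CH/AH or CH/PH < 0.8
    c3 = False
    if None not in (height, height_above, height_below):
        # Criterion 3: more than 20% lower than BOTH neighbours (assumption).
        c3 = height < 0.8 * height_above and height < 0.8 * height_below
    # ASSUMPTION: meeting any one criterion counts as a fracture.
    return {"criterion_1": c1, "criterion_2": c2, "criterion_3": c3,
            "any": c1 or c2 or c3}

# Hypothetical wedge-shaped vertebra: anterior height clearly reduced.
print(meets_fracture_criteria(ah=14.0, ch=18.0, ph=20.0))
# Hypothetical normal vertebra with normal neighbours.
print(meets_fracture_criteria(ah=20.0, ch=19.0, ph=20.0,
                              height=20.0, height_above=19.5, height_below=20.5))
```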

Statistical analysis. Visual scores regarding the visibility of intervertebral spaces were compared using the Wilcoxon signed-rank test. The absolute values of the difference in measured vertebral heights (D1 and D2) were compared using the paired t-test. The absolute percentage errors of the measured vertebral heights (%Error1 and %Error2) were also compared using the paired t-test. Interobserver agreement for each of D1 and D2 was evaluated using the intraclass correlation coefficient (ICC). To analyze diagnostic performance for detecting compression fracture, jackknife alternative free-response receiver operating characteristic (JAFROC) analysis was performed using the figure of merit (FOM), which is defined as the probability that a lesion is rated higher than the highest rated non-lesion on a normal image 16. In the present study, JAFROC1 was used rather than JAFROC or JAFROC2 because of its high statistical power for human observers 17. For all tests, a P value less than 0.05 was considered significant.
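The definition quoted above (the probability that a lesion is rated higher than the highest rated non-lesion on a normal image) can be estimated empirically as sketched below. This is an illustration only: the function name and ratings are hypothetical, ties are counted as one half by assumption, and the JAFROC1 variant used in the study, which differs in which images contribute non-lesion ratings, is not reproduced here, nor is the jackknife significance testing.

```python
def fom_lesion_vs_normal(lesion_ratings, normal_case_ratings):
    """Empirical figure of merit following the definition in the text.

    lesion_ratings: one confidence rating per true lesion (fractured vertebra).
    normal_case_ratings: for each patient WITHOUT any lesion, the list of
        ratings given to that patient's (non-lesion) vertebrae.
    Returns the fraction of (lesion, normal case) pairs in which the lesion is
    rated higher than the highest non-lesion rating of the normal case;
    ties count as 0.5 (an ASSUMED convention).
    """
    highest_nonlesion = [max(ratings) for ratings in normal_case_ratings]
    score = 0.0
    for lesion in lesion_ratings:
        for highest in highest_nonlesion:
            if lesion > highest:
                score += 1.0
            elif lesion == highest:
                score += 0.5
    return score / (len(lesion_ratings) * len(highest_nonlesion))

# Hypothetical ratings on the 4-point confidence scale.
lesions = [4, 3, 1]                      # three fractured vertebrae
normals = [[1, 1, 2], [1, 1, 1]]         # two patients without fracture
print(fom_lesion_vs_normal(lesions, normals))
```

A value near 1 means lesions are almost always rated above the most suspicious non-lesion on normal cases, whereas a missed lesion (rated 1) pulls the value down.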

Results
Qualitative analysis. Mean visual scores regarding the visibility of intervertebral spaces are summarized in Table 2. The mean score was significantly higher for VTS than for 4-mm-thick images for all regions for both readers (P < 0.001) (Fig. 2).
Quantitative analysis. The mean measured heights of Th1 and L1 are summarized in Table 3. The mean absolute value of the difference in measured height between VTS and true 1-mm-thick images (D2) was less than that between 4-mm-thick and true 1-mm-thick images (D1) for both readers, and the difference was significant for L1 measured by Reader 1 (P < 0.01). The mean absolute percentage error of the measured height between VTS and true 1-mm-thick images (%Error2) was smaller than that between 4-mm-thick and true 1-mm-thick images (%Error1) for both readers, and the difference was significant for L1 measured by Reader 1 (P < 0.01). The ICCs of the two readers for height measured on the 4-mm-thick images were 0.461 and 0.795 for Th1 and L1, respectively, whereas those measured on VTS were 0.524 and 0.813 for Th1 and L1, respectively.
Diagnostic performance in detecting compression fracture. Diagnostic performance is summarized in Table 4. Some compression fractures that were correctly diagnosed by both readers on 4-mm-thick images were missed on VTS (Figs. 3, 4). Sensitivity, positive predictive value, and FOM were lower for the VTS images than for the 4-mm-thick images for both readers, and the difference was statistically significant for FOM for Reader 1 (P = 0.02) (Fig. 5).

Table 3. Measured heights of thoracic and lumbar vertebrae. Data are mean ± standard deviation (mm, except for %Error1 and %Error2). VTS virtual thin slice, Th1 first thoracic vertebra, L1 first lumbar vertebra, D1 absolute value of the difference between measured heights on thick-slice images and those on thin-slice images, D2 absolute value of the difference between measured heights on VTS images and those on thin-slice images, %Error1 absolute percentage error between measured heights on thick-slice images and those on thin-slice images, %Error2 absolute percentage error between measured heights on VTS images and those on thin-slice images.

Figure 3. Sagittal reformatted images reconstructed from 1-mm-thick images (a), 4-mm-thick images (b), and virtual thin-slice images (c). A compression fracture of the 8th thoracic vertebra is seen on the reconstruction from 1-mm-thick images (arrow), but is not depicted on that from virtual thin-slice images.

Discussion
GAN is a type of deep learning model capable of generating realistic-looking fake images 8. In recent years, GANs have been used for various radiology applications, such as noise reduction of CT images 9-11, augmentation of data for deep learning algorithm training 18, and generation of images of different modalities 12,13. VTS is a newly developed GAN-based algorithm that can generate virtual images of 1-mm thickness from thick-slice CT images of 3-10-mm thickness. Although VTS can be applied to any part of the body, it is considered to be more effective in high-contrast regions such as bone 14. Thus, we conducted the present study to investigate the utility of VTS for morphological evaluation of the spine. The results of our qualitative analysis demonstrated that the visibility of intervertebral spaces was higher on sagittal reformatted images created from VTS images than on reformatted images made from 4-mm-thick images, for all spinal regions. The intervertebral spaces of the cervical and upper thoracic spine were hardly visible on reformatted images made from the 4-mm-thick images (mean score, 1.0-2.0), but visibility was improved on reformatted images made from VTS images (mean score, 2.5-2.6). Thus, reformatted images made from VTS images would make it easier to recognize and number individual vertebral bodies, and thus identify the vertebral level of a lesion. Moreover, this technique would improve the quality of reformatted or 3D images and make it easier to obtain an overview of the whole spine.

Figure 4. The fracture of the 10th thoracic vertebra can be seen on the reconstruction from 1-mm-thick images (arrow), and is also identifiable on that from the 4-mm-thick images. However, it is barely visible on that from the virtual thin-slice image, and is therefore difficult to diagnose.
The absolute values of the differences in measured heights of thoracic and lumbar vertebrae between VTS and true 1-mm-thick images were smaller than those between 4-mm-thick and true 1-mm-thick images, and interobserver agreement was slightly improved using VTS compared with 4-mm-thick images. Therefore, reformatted VTS images might have the potential to achieve more accurate measurement of bone structures compared with reformatted images using 4-mm-thick images. However, the height tends to be underestimated when using VTS, and it remains unclear whether VTS can be used for quantitative evaluation instead of thin-slice images. Further studies will be necessary to confirm the usefulness of VTS images in quantitative evaluation.
Regarding diagnostic ability for compression fracture, VTS had impaired performance compared with 4-mm-thick images because slight compression fractures were sometimes not depicted correctly on VTS images (Fig. 3). If localized mild depressions suggestive of a mild compression fracture are not visible on the 4-mm-thick images, then the VTS images generated from these images are also unlikely to contain such information. Moreover, some compression fractures were depicted less definitively on the VTS images than on the 4-mm-thick images (Fig. 4). If a mild compression fracture is located near the boundary between two adjacent 4-mm-thick images, it might be recognizable on the reformatted 4-mm-thick images. However, in the process of VTS generation, there might be a tendency to make the morphology of vertebral bodies closer to that of normal vertebrae, which might obscure such a slight compression fracture. This might be the reason for the impaired diagnostic performance. VTS was originally developed for purposes such as improving the visibility of the vertebral bodies, and not for the diagnosis of lesions such as compression fractures. Thus, as indicated by the present results, the current VTS technique would not be suitable for the evaluation of subtle abnormalities. Although training using more cases, including those with compression fractures, might improve the diagnostic ability of VTS, it is unclear whether the trained algorithm could accurately delineate subtle lesions. Further improvement and validation will be necessary before VTS can be used for the purpose of diagnosing lesions. Our results suggest that virtual images generated by a GAN do not always accurately depict pathological abnormalities, and this might also be true for GANs used for other purposes, such as noise reduction, super-resolution, and generation of images of different modalities. Thus, these virtual images would need to be validated before use for diagnostic purposes in routine clinical practice.
Our study had several limitations. First, this was a retrospective study, and the number of patients was relatively small. Second, although VTS software can be applied to thick-slice images of 3-10-mm thickness, we evaluated only 4-mm-thick images. Because thick-slice images with a thickness 4 or 8 times that of the thin-slice images were used when training the VTS 14, images with a thickness of 4 mm were considered the most suitable for this software. Third, the reference standard for compression fractures was determined by consensus reading of the true 1-mm-thick CT images, and other diagnostic modalities such as MR imaging were not performed. As the purpose of this study was to investigate whether VTS images could be used as a substitute for true thin-slice images, it was appropriate to use the thin-slice images as the gold standard.

Conclusions
Virtual thin-slice technique enabled the identification of all vertebral bodies and more accurate measurement of vertebral height compared with thick-slice images, but is not suitable for the detection of compression fractures. Further improvements are needed before virtual thin-slice images can achieve the same diagnostic performance as true thin-slice images for detecting lesions.

Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.